Applying Finite-State Methods to the Swahili Language

Authors

  • Isaiah Greene
  • Michael Wangia
  • Aravind K. Joshi
Abstract

Herein, we survey the finite-state methods that currently exist for analyzing English grammar and assess whether they can be applied to the Swahili language and its syntactic patterns. Further, we explore the differences between Swahili grammar and English grammar to see whether these finite-state methods can be adapted to Swahili. In the end, the objective is to deliver a recognition device that identifies well-formed Swahili sentences and generates new Swahili strings whose components (i.e., noun phrases, prepositional phrases, etc.) are enclosed in parentheses. To the best of our knowledge, no work has ever been done on Swahili language processing. Also, the structure of Swahili sentences is substantially different from that of English sentences. One example is that Swahili has no equivalent words for the English determiners (DET) “the” and “a”. We hope that our work can contribute to developing better Swahili-to-English and English-to-Swahili dictionaries, translators, Swahili grammar checkers, speech recognition devices, and the like. We anticipate that our work in this field can bridge the gap between native Swahili speakers and native English speakers and lead to approaches for other dialects as well.
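
To make the idea of a bracketing recognizer concrete, the following is a minimal sketch and not the authors' system. It assumes a hypothetical six-word lexicon and a toy regular pattern, NP V (NP) (PP), in which a noun phrase is a noun followed by any number of adjectives and has no determiner slot. The tag names, the lexicon, and the functions `bracket` and `_np_end` are all illustrative assumptions.

```python
# Toy lexicon (an assumption for illustration): word -> part-of-speech tag.
LEXICON = {
    "mtoto": "N",    # child
    "kitabu": "N",   # book
    "nyumba": "N",   # house
    "mzuri": "ADJ",  # good
    "anasoma": "V",  # (he/she) reads
    "katika": "P",   # in
}


def _np_end(tags, i):
    """Return the index just past a noun phrase starting at i, or None.

    Toy rule: NP = N ADJ*. There is no determiner slot, and adjectives
    follow the noun.
    """
    if i >= len(tags) or tags[i] != "N":
        return None
    j = i + 1
    while j < len(tags) and tags[j] == "ADJ":
        j += 1
    return j


def bracket(sentence):
    """Recognize the toy pattern NP V (NP) (PP) in one left-to-right pass.

    Returns the sentence with its components enclosed in parentheses,
    or None if the sentence is not accepted by the toy pattern.
    """
    words = sentence.split()
    tags = [LEXICON.get(w) for w in words]
    if None in tags:  # unknown word: reject
        return None

    parts = []
    # Obligatory subject noun phrase.
    j = _np_end(tags, 0)
    if j is None:
        return None
    parts.append("(NP " + " ".join(words[:j]) + ")")
    i = j
    # Obligatory verb.
    if i >= len(tags) or tags[i] != "V":
        return None
    parts.append("(V " + words[i] + ")")
    i += 1
    # Optional object noun phrase.
    j = _np_end(tags, i)
    if j is not None:
        parts.append("(NP " + " ".join(words[i:j]) + ")")
        i = j
    # Optional prepositional phrase: P followed by a noun phrase.
    if i < len(tags) and tags[i] == "P":
        j = _np_end(tags, i + 1)
        if j is None:
            return None
        parts.append("(PP " + words[i] + " (NP " + " ".join(words[i + 1:j]) + "))")
        i = j
    # Accept only if every word was consumed.
    return " ".join(parts) if i == len(tags) else None


print(bracket("mtoto mzuri anasoma kitabu katika nyumba"))
print(bracket("mzuri mtoto anasoma"))
```

Under these assumptions, the first call prints a bracketed string and the second prints `None`, because the toy noun-phrase rule requires the noun to precede its adjective. A realistic system would need a morphological analysis of Swahili noun classes and verb agreement, which this sketch does not attempt.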

Similar resources

The Form and Interpretation of Finite and non-Finite Verbs in Swahili

A great deal is known about the distribution of finite and non-finite forms in early language and increasingly, studies are investigating the semantic properties of these different forms. An important question for acquisition theory concerns the relationship between the child's developing morphosyntax and the semantics typically expressed by these structures. In this paper we will explore the f...

SYNERGY: A Named Entity Recognition System for Resource-scarce Languages such as Swahili using Online Machine Translation

Developing Named Entity Recognition (NER) for a new language using standard techniques requires collecting and annotating large training resources, which is costly and time-consuming. Consequently, for many widely spoken languages such as Swahili, there are no freely available NER systems. We present here a new technique to perform NER for new languages using online machine translation systems....

The morphosyntax of mood in early grammar with special reference to Swahili

In this paper we explore the development of the morphosyntax-semantics interface by comparing development in 4 typologically diverse languages: Dutch (a Germanic V2 language), Greek, Italian (a Romance pro-drop language) and Swahili (a Bantu language), with particular emphasis on Swahili, a relatively understudied language whose morphosyntactic structure is particularly relevant to the question...

HFST - Framework for Compiling and Applying Morphologies

HFST–Helsinki Finite-State Technology (hfst.sf.net) is a framework for compiling and applying linguistic descriptions with finite-state methods. HFST currently connects some of the most important finite-state tools for creating morphologies and spellers into one open-source platform and supports extending and improving the descriptions with weights to accommodate the modeling of statistical inf...

Word-Level Language Identification and Predicting Codeswitching Points in Swahili-English Language Data

Codeswitching is a very common behavior among Swahili speakers, but of the little computational work done on Swahili, none has focused on codeswitching. This paper addresses two tasks relating to Swahili-English codeswitching: word-level language identification and prediction of codeswitch points. Our two-step model achieves high accuracy at labeling the language of words using a simple feature...

Publication date: 2010